Skip to content

[HIPDNN] Tolerance lookup followup#8752

Draft
bghimireamd wants to merge 19 commits into
users/bghimire/ALMIOPEN-1969/tolerance_lookupfrom
users/bghimire/ALMIOPEN-1969/tolerance_lookup_followup
Draft

[HIPDNN] Tolerance lookup followup#8752
bghimireamd wants to merge 19 commits into
users/bghimire/ALMIOPEN-1969/tolerance_lookupfrom
users/bghimire/ALMIOPEN-1969/tolerance_lookup_followup

Conversation

@bghimireamd

Copy link
Copy Markdown
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

bghimireamd and others added 19 commits June 22, 2026 13:25
…oading

Bundle discovery scans integration_test_bundles/ for .meta.json files,
derives GTest names from folder structure, and eagerly loads graph +
tensor data. IntegrationTestBundle models bundles as facets with optional
golden outputs. BundleRegistration wires discovered bundles into GTest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…model

Verification modes (auto|golden|gpu|cpu) control what engine output is
compared against. The harness extracts golden outputs before execution,
runs references via ReferenceCapabilityError-aware adapters, and reports
unverifiable bundles. Includes CLI flags, env fallbacks, and unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Graph-only bundles without input data get their inputs synthesized
per-op via free functions dispatched by node attribute type.
RoleAccounting enforces that every owned leaf input is accounted for
(FREE/STRUCTURED/DERIVED) — unaccounted inputs refuse the bundle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename FillOutcome → SynthesisResult and RoleAccounting → SynthesisTracker
  for clarity: SynthesisTracker tracks per-node input role declarations,
  SynthesisResult reports whether synthesis succeeded.
- Rename RoleAccounting.hpp → SynthesisTracker.hpp to match the class name.
- Expand input synthesis from 3 ops to all 19 supported attribute types
  (conv fwd/bwd/wrw, batchnorm inference/variance/training/backward,
  matmul, pointwise, reduction, layernorm fwd/bwd, rmsnorm fwd/bwd,
  resample, block-scale dequantize/quantize, sdpa fwd/bwd).
- Remove single-node graph restriction — harness now loops over all nodes
  in a graph, supporting fused graphs (e.g. conv+bias+relu).
- Improve comments throughout for reviewer clarity: explain what "owns"
  means, document the 4-step fill pattern, add per-function comments
  explaining why specific inputs are STRUCTURED or DERIVED.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all method implementations out of the 831-line header-only harness
into a .cpp file (~700 lines), leaving the header as declarations only
(~210 lines). The harness is an internal test executable with virtual
methods — no reason for it to be header-only.

Add 9 unit tests for SynthesisTracker covering: all-FREE success,
undeclared inputs, STRUCTURED/DERIVED refusals, uid-0 skip, non-owned
uid skip, empty owned set, mixed failures, and factory methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make the harness's mode dispatch testable by reading the verification
mode through a protected virtual rather than directly from the TestConfig
singleton. Add 10 unit tests covering every branch: auto mode with and
without golden data, GPU→CPU fallback chain, golden/gpu/cpu explicit
modes, and capability-miss skip paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…clang-tidy

- Track gpuRefErrored in runAutoMode() so the skip message distinguishes
  GPU-ref-errored from GPU-ref-capability-miss when CPU ref also misses.
- Extract resolveVerificationMode() and resolveGoldenDataDir() as free
  functions (CLI > env > nullopt precedence) and add 6 unit tests.
- Fix readability-implicit-bool-conversion in SynthesizeInputs.hpp:
  change all if(!ptr) to if(ptr == nullptr).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously each fill function created its own SynthesisTracker with the
whole-graph leaf input UIDs, so finish() rejected any UID not declared
by that single node. For multi-node/fused graphs (e.g. conv+relu), the
second node's finish() saw the first node's UIDs as "no role declared"
and returned unsupported → SKIP.

Fix: create one SynthesisTracker in synthesizeInputs(), pass it through
all fill functions, call finish() once after all nodes have declared
their UIDs. Each node now only accounts for its own inputs, and the
final finish() verifies that the union covers all leaf inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-optional, runEngineOrSkip

- Fill output tensors with NaN sentinel instead of zero so allClose's
  NaN guard catches unwritten elements (previously masked as computed zero)
- Mark FP8 descale/scale factors as STRUCTURED (refuse to fabricate random
  values that cause identical saturation → vacuous pass), matching
  BlockScaleDequantize precedent
- Make metadata mandatory only when golden outputs are present; graph-only
  bundles default-construct empty metadata instead of failing with
  MISSING_METADATA
- Extract runEngineOrSkip() helper to deduplicate engine-run preamble
  across all three verification modes
- Thread engine error message into skip reason (was silently discarded)
- Make applyMetadataGuards() virtual so unit tests can override the
  hardware-guard path without initializing TestConfig singleton
- Fix test name lint failures (suite/case naming convention)
- Add SynthesisTracker precondition documentation
- Add synthesizeInputs() phase documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ref path

- Remove uid!=0 guard from SynthesisTracker::isOwned() — uid=0 is a
  valid tensor uid when the graph declares it as a real leaf input.
  The "absent optional" convention (uid 0 = not connected) is the
  fill function's responsibility, not the tracker's.
- Guard ref-path device allocation on _requiresDevice so CI runners
  without a GPU don't trigger std::bad_alloc from rawDeviceData().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ghimire/ALMIOPEN-1970/golden_reference_fallback
deriveTestName() threw when a graph .json sat at the data root with no
folder above it. That blocked the natural "--gd .../graph_only_bundle"
invocation, where the user points the data root straight at a single
bundle's folder. Use the root folder's own name as the suite name in
that case instead of throwing, so the bundle is discovered as
{folder}.{stem}.

Update the unit test (JsonAtRootThrows -> JsonAtRootUsesFolderNameAsSuite)
to assert the new behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract shared TomlGuards.hpp and wire all three harnesses through it.
Fix toleranceForNodeAttributes missing SdpaAttributes/SdpaBackwardAttributes.
Add SynthesisTracker::fillComputed + tensorAt for follow-up init unification.
Rename appendTensorDiff/appendFpDiff to writeTensorDiffReport/writeFpDiffReport.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-init unification branch)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ards calls

Both harnesses now use the same shared free functions (skipIfTomlMatched,
applyTomlToleranceOverride) from TomlGuards.hpp. The TOML lookup logic is
already unit-tested in TestTestSettings.cpp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Harness unification (ALMIOPEN-1969 follow-up):

- Share input synthesis between both harnesses: move SynthesisTracker
  and SynthesizeInputs to harness/input_init/ (namespace
  hipdnn_integration_tests). The non-golden harness now drives its
  initializeBundle() through the same synthesis switch, with a random
  [-1,1] fallback. Remove 4 of 7 non-golden initializeBundle overrides
  whose ranges/seeds now match the shared fill functions; 3 remain
  (fused-graph range conflicts and a large-values stress test).

- Rename harness/golden/ -> harness/bundle/ and namespace
  hipdnn_integration_tests::golden -> ::bundle. "golden" was inaccurate:
  the harness verifies bundles via golden data OR a GPU/CPU reference,
  and most bundles ship no golden output. Rename the class
  IntegrationGraphGoldenReferenceVerificationHarness ->
  IntegrationBundleVerificationHarness (parallel to its sibling
  IntegrationGraphVerificationHarness). Fix stale "golden" log/report
  strings that mislabeled all bundles or shared comparison paths. The
  golden verification *mode* and golden output *data* vocabulary, and
  the --golden-data-dir CLI flag, are retained (accurate / external).

- Drop redundant hipInit/hipGetDevice/_deviceId from the graph harness:
  HIP initializes lazily and getSharedHandle()->hipdnnCreate() already
  does it before any graph runs; _deviceId was write-only.

- Split bundle metadata guards: the VRAM check applies to every bundle
  the engine runs, but the GPU-arch lock only matters when comparing
  against golden output values (arch-specific). Gate the arch check on
  hasGoldenOutputs so an inputs-only bundle verified against a local
  reference is not wrongly skipped on a different arch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both harnesses resolved comparison tolerance with duplicated, divergent
code. Replace both with one shared resolver
(harness/tolerance/ToleranceResolver.hpp) where the aggregation policy is a
plain function (GraphWrapper, dtype) -> float, selectable per caller.

What changed:
- New resolveTolerance(wrapper, dtype, testName, atol, rtol, policy=max):
  derives a default via the chosen aggregation policy, then applies the TOML
  per-test override (highest priority) in ONE place. Single tolerance entry
  point for both harnesses; the override is no longer applied separately in
  either, so the layering order lives here alone.
- Two policies, each a free function (no enum/switch; the policy IS the
  function, C++17 function pointer):
    * maxAcrossNodes  — loosest per-node tolerance; conservative envelope,
      never tighter than any node so it cannot cause a false failure.
    * outputOpTolerance — tolerance of the last non-Pointwise op (the
      output-producing op); reproduces the graph harness's historical
      getTolerance() behavior.
- Wiring preserves existing behavior on both sides:
    * Graph harness getTolerance() -> outputOpTolerance (its prior policy),
      now expressed through the shared resolver instead of a private
      dynamic_cast switch. It also serializes via to_binary() and reads the
      output dtype from the flatbuffer, so it shares the resolver keyed on the
      same representation the bundle harness uses.
    * Bundle harness -> maxAcrossNodes (its prior policy), unchanged.
  The two policies agree for the common one-real-op + activation case
  (activation is Pointwise -> skipped; the single real op is both loosest and
  last), so they differ only on multi-real-op fusions, which carry explicit
  overrides today.
- Deletes the duplicate per-op tolerance switch that existed in each harness
  (one on dynamic_cast of live nodes, one on the flatbuffer NodeAttributes
  enum); both now share one flatbuffer-keyed lookup.
- warnIfMultipleOutputs() guards the single-output assumption: every current
  policy reduces over the whole graph, not the per-output subgraph, so
  multi-output graphs are flagged loudly (deferred fix, ALMIOPEN-2216).

Why output-op for the graph harness (not max): a mechanical migration of the
C++ graph tests should not change their tolerances. outputOpTolerance keeps the
exact prior numbers. Note this is a heuristic, not a principled tight bound
(it attributes the whole output tolerance to one op); it is kept for migration
parity. The principled tighten path is the future DynamicTolerances upgrade
(see TODO in ToleranceResolver.hpp and ALMIOPEN-2216); FP4 also needs sub-bf16
entries in the dtype switch (currently falls through to 1e-3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bghimireamd bghimireamd changed the base branch from develop to users/bghimire/ALMIOPEN-1969/tolerance_lookup June 24, 2026 05:00
@bghimireamd bghimireamd force-pushed the users/bghimire/ALMIOPEN-1969/tolerance_lookup branch 4 times, most recently from 67c0a2b to e8ff12f Compare June 26, 2026 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant